
fix(terminal): filter terminal query sequences from captured output #2334

Open
jpshackelford wants to merge 7 commits into main from
fix/terminal-escape-filter-minimal

Conversation

@jpshackelford
Contributor

@jpshackelford jpshackelford commented Mar 6, 2026

Summary

This is a more narrowly scoped solution to #2244 than our original change,
#2245, which carries more risk of unintended consequences.

This fix handles queries captured from PTY output (commands run via the terminal tool), which are the only known cases of real problems. SDK-side queries (e.g., Rich library capability detection) are not addressed here and would require filtering at the conversation/visualizer boundary. They have not been observed in the wild and do not warrant the more complex change that filtering them would entail.

Problem

When CLI tools like gh, npm, or other progress-indicator tools run inside the SDK's PTY, they send terminal query sequences as part of their spinner/progress UI:

  • \x1b[6n - DSR (Device Status Report) - cursor position query
  • \x1b]11;? - OSC 11 - background color query
  • \x1b[c - DA (Device Attributes) query

These queries get captured as part of the command output. When the output is later displayed to the user's terminal, the terminal processes these queries and responds, causing visible garbage like:

^[[38;1R^[]11;rgb:30fb/3708/41af^G
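
As a rough illustration of how the raw response bytes turn into the text shown above, here is a small sketch that renders control bytes in caret notation. The helper name `caret` is hypothetical and is not part of the SDK.

```python
def caret(data: bytes) -> str:
    """Render control bytes the way a terminal echoes them (ESC -> ^[, BEL -> ^G)."""
    out = []
    for b in data:
        if b < 0x20:
            out.append("^" + chr(b + 0x40))  # control byte: ^ plus the byte shifted by 0x40
        elif b == 0x7F:
            out.append("^?")                 # DEL
        else:
            out.append(chr(b))
    return "".join(out)


# A cursor position report followed by an OSC 11 colour report (BEL terminated).
response = b"\x1b[38;1R\x1b]11;rgb:30fb/3708/41af\x07"
shown = caret(response)
```

With these bytes, `shown` is the same string as the garbage quoted above.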

How to Reproduce

  1. Run any SDK example that executes terminal commands
  2. Execute a command that uses progress indicators: gh pr list --repo OpenHands/openhands
  3. Observe escape code garbage appearing in the output or corrupting the shell prompt after exit

Visual Example

Before fix:

$ gh pr list
Fetching PRs...^[[6n^[]11;?
#123  Fix bug    main
^[[38;1R

After fix:

$ gh pr list
Fetching PRs...
#123  Fix bug    main

Root Cause Analysis

The escape codes are IN the captured PTY output stream, not generated by terminal responses to the SDK's own queries. When gh (or similar tools) runs:

  1. gh sends \x1b[6n to query cursor position (for spinner positioning)
  2. This query is written to the PTY's stdout
  3. The SDK captures all PTY output, including the query
  4. When displayed, the user's terminal sees the query and responds
  5. The response appears as visible garbage

Solution

Add a TerminalQueryFilter class that strips terminal query sequences from captured output before it is returned from the terminal tool. This removes the queries at the source, so the user's terminal never sees them.

Key features:

  • Stateful filtering: Handles escape sequences that may be split across incremental output chunks (for long-running commands)
  • Preserves formatting: Only removes query sequences; colors, cursor movement, and other formatting are preserved

Filtered sequences:

  • DSR (\x1b[6n) - cursor position query
  • OSC queries (\x1b]N;?) - color queries (foreground, background, palette)
  • DA/DA2 (\x1b[c, \x1b[>c) - device attributes
  • DECRQSS (\x1bP$q...\x1b\\) - terminal state queries

Preserved sequences:

  • ANSI colors (\x1b[31m, \x1b[0m, etc.)
  • Cursor movement (\x1b[H, \x1b[5A, etc.)
  • Text formatting (bold, underline, etc.)
  • Window title and hyperlinks
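
The filtered/preserved split listed above can be sketched with a single compiled byte regex. This is a minimal illustration, not the PR's TerminalQueryFilter: the name `strip_queries` and the exact patterns are assumptions written to match the sequences named in this description.

```python
import re

# Hypothetical patterns, written to match the sequences listed above.
_QUERY = re.compile(
    rb"\x1b\[6n"                                            # DSR: cursor position query
    rb"|\x1b\[>?0?c"                                        # DA / DA2: device attributes
    rb"|\x1b\][0-9]+(?:;[^;\x07\x1b]*)?;\?(?:\x07|\x1b\\)"  # OSC query, BEL or ST terminated
    rb"|\x1bP\$q[^\x1b]*\x1b\\"                             # DECRQSS, ST terminated
)


def strip_queries(data: bytes) -> bytes:
    """Remove terminal query sequences and leave every other escape untouched."""
    return _QUERY.sub(b"", data)


captured = b"Fetching PRs...\x1b[6n\x1b]11;?\x07\n\x1b[31m#123\x1b[0m  Fix bug\n"
cleaned = strip_queries(captured)
```

In this sketch the colour codes around `#123` survive, while the DSR and OSC 11 queries are dropped. A window title such as `b"\x1b]0;title\x07"` passes through unchanged because it does not end in `;?`.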

Testing

# Run the unit tests
uv run pytest tests/tools/terminal/test_escape_filter.py -v

Files Changed

| File | Change |
| --- | --- |
| openhands-tools/.../utils/escape_filter.py | Added TerminalQueryFilter class for stateful filtering |
| openhands-tools/.../utils/__init__.py | Export new class |
| openhands-tools/.../terminal_session.py | Use stateful filter instance |
| tests/tools/terminal/test_escape_filter.py | 31 unit tests (20 original + 11 new for stateful behavior) |

Design Decisions

  1. Stateful filtering: The filter maintains state across calls to handle escape sequences split across chunks. This is necessary because long-running commands surface output incrementally.

  2. Filter at source: Apply filter where output is captured (terminal tool), not where it's displayed. This is simpler and more reliable.

  3. Byte-level regex: Use compiled regex patterns on bytes for accurate escape sequence matching.

  4. Preserve formatting: Only remove query sequences that trigger responses; keep colors and cursor movement intact.

  5. Minimal scope: This fix targets only PTY output processing - SDK-side queries (e.g., Rich library) are out of scope and would need changes at the visualizer boundary.
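
To make design decisions 1 and 3 concrete, here is a minimal sketch of a stateful byte-level filter. It is an assumption-laden illustration, not the PR's implementation: the class name `ChunkedQueryFilter` and both patterns are hypothetical, and only the method names filter(), flush(), and reset() mirror the ones the PR describes.

```python
import re

# Hypothetical patterns; simplified relative to whatever the PR actually uses.
_QUERY = re.compile(
    rb"\x1b\[6n|\x1b\[>?0?c"
    rb"|\x1b\][0-9]+(?:;[^;\x07\x1b]*)?;\?(?:\x07|\x1b\\)"
    rb"|\x1bP\$q[^\x1b]*\x1b\\"
)
# An escape sequence that has started but not finished by the end of the chunk:
# a lone ESC, a CSI without its final byte, or an unterminated OSC/DCS
# (possibly ending on the ESC that begins the ST terminator).
_INCOMPLETE = re.compile(rb"\x1b(?:\[[0-9;>?]*|[\]P][^\x07\x1b]*\x1b?)?\Z")


class ChunkedQueryFilter:
    def __init__(self) -> None:
        self._pending = b""

    def reset(self) -> None:
        """Forget any held bytes, e.g. before the next command."""
        self._pending = b""

    def filter(self, chunk: bytes) -> bytes:
        """Strip complete queries; hold back an unfinished trailing sequence."""
        data = _QUERY.sub(b"", self._pending + chunk)
        self._pending = b""
        tail = _INCOMPLETE.search(data)
        if tail:
            self._pending = data[tail.start():]
            data = data[: tail.start()]
        return data

    def flush(self) -> bytes:
        """Return whatever is still held once the command has completed."""
        pending, self._pending = self._pending, b""
        return pending


f = ChunkedQueryFilter()
out = f.filter(b"Fetching\x1b[") + f.filter(b"6n done\x1b[31m!") + f.flush()
```

Here the DSR query is split across two chunks, yet `out` contains neither half of it, and the colour code in the second chunk is kept.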

Partially addresses: #2244


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:8f20826-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-8f20826-python \
  ghcr.io/openhands/agent-server:8f20826-python

All tags pushed for this build

ghcr.io/openhands/agent-server:8f20826-golang-amd64
ghcr.io/openhands/agent-server:8f20826-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:8f20826-golang-arm64
ghcr.io/openhands/agent-server:8f20826-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:8f20826-java-amd64
ghcr.io/openhands/agent-server:8f20826-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:8f20826-java-arm64
ghcr.io/openhands/agent-server:8f20826-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:8f20826-python-amd64
ghcr.io/openhands/agent-server:8f20826-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:8f20826-python-arm64
ghcr.io/openhands/agent-server:8f20826-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:8f20826-golang
ghcr.io/openhands/agent-server:8f20826-java
ghcr.io/openhands/agent-server:8f20826-python

About Multi-Architecture Support

  • Each variant tag (e.g., 8f20826-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 8f20826-python-amd64) are also available if needed

Filter terminal query sequences (DSR, OSC, DA, etc.) from captured PTY output
before returning from terminal tool. These queries cause the terminal to respond
when displayed, producing visible escape code garbage.

Root cause: CLI tools like `gh` send terminal queries as part of their
progress/spinner UI. When captured and displayed, the terminal processes
them and responds, causing visible garbage like `^[[38;1R`.

Solution: Add filter_terminal_queries() to strip query sequences while
preserving legitimate formatting codes (colors, bold, etc.).

Fixes: #2244

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

Collaborator

@all-hands-bot all-hands-bot left a comment


🟡 Acceptable - Requires Eval Verification

Taste Rating: Code is clean and solves a real problem. However, this touches terminal output handling and needs eval verification before approval per repo guidelines.

Assessment:

  • ✅ Solves real problem (visible escape code garbage)
  • ✅ Simple, targeted solution (filter at source)
  • ✅ Comprehensive tests with good coverage
  • ⚠️ Touches terminal/stdout handling → flag for lightweight evals

The implementation is solid. Regex patterns are compiled, tests verify both removal and preservation, and the fix is applied at the right layer. Once evals confirm no regressions, this is ready to merge.

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-tools/openhands/tools/terminal/terminal | | | | |
| terminal_session.py | 193 | 58 | 69% | 95, 101, 105–107, 143–144, 180, 195–196, 237–239, 244, 247–248, 252, 258, 261, 276–278, 283, 286–287, 291, 297, 300, 320, 322, 325, 327, 343, 358, 364, 373, 376, 410, 414, 417, 420–421, 427–428, 434, 437, 444–445, 451–452, 522, 527–528, 537–539, 545–546 |
| openhands-tools/openhands/tools/terminal/utils | | | | |
| escape_filter.py | 43 | 12 | 72% | 142–143, 149–150, 169–170, 172–173, 197–198, 200–201 |
| TOTAL | 20734 | 10448 | 49% | |

Change the OSC filter pattern from matching specific codes (10, 11, 4) to
matching any OSC query (sequences ending with ;? before terminator).

This is more future-proof and catches additional query types like:
- OSC 12 (cursor color)
- OSC 17 (highlight background)
- Any other OSC queries that follow the standard format

The pattern now matches: ESC ] Ps [;param] ;? TERMINATOR
Where ;? indicates it's a query, not a set operation.

Importantly, SET operations are preserved:
- OSC 0 (window title)
- OSC 8 (hyperlinks)
- OSC 7 (working directory)

Co-authored-by: openhands <openhands@all-hands.dev>
@jpshackelford
Contributor Author

Good catch! I've updated the OSC pattern to be more general.

Before: Matched only OSC codes 10, 11, 4
After: Matches any OSC sequence ending with ;? (the query marker)

The new pattern: ESC ] Ps [;param] ;? TERMINATOR

This catches all OSC queries:

  • ✅ OSC 10/11 (fg/bg color)
  • ✅ OSC 4 (palette)
  • ✅ OSC 12 (cursor color)
  • ✅ OSC 17 (highlight background)
  • ✅ Any future OSC query types

While preserving SET operations:

  • ✅ OSC 0 (window title) - no ;? = preserved
  • ✅ OSC 8 (hyperlinks) - no ;? = preserved
  • ✅ OSC 7 (working directory) - no ;? = preserved

Added 5 new tests to verify the behavior. See commit a499579.
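
For readers who want to see the `ESC ] Ps [;param] ;? TERMINATOR` shape as a regex, here is one possible rendering. It is a sketch under assumptions: the name `OSC_QUERY` and the exact character classes are illustrative and may differ from the pattern in commit a499579.

```python
import re

# ESC ] Ps [;param] ;? TERMINATOR, where TERMINATOR is BEL or ST (ESC \).
# Hypothetical pattern for illustration only.
OSC_QUERY = re.compile(rb"\x1b\][0-9]+(?:;[^;\x07\x1b]*)?;\?(?:\x07|\x1b\\)")


def is_osc_query(seq: bytes) -> bool:
    """True when the whole sequence is an OSC query (ends in ;? before the terminator)."""
    return OSC_QUERY.fullmatch(seq) is not None


queries = [
    b"\x1b]11;?\x07",      # OSC 11: background color query
    b"\x1b]4;1;?\x1b\\",   # OSC 4: palette entry 1 query, ST terminated
    b"\x1b]12;?\x07",      # OSC 12: cursor color query
]
sets = [
    b"\x1b]0;my title\x07",                # OSC 0: set window title
    b"\x1b]8;;https://example.com\x1b\\",  # OSC 8: hyperlink
    b"\x1b]7;file://host/tmp\x07",         # OSC 7: working directory
]
query_results = [is_osc_query(s) for s in queries]
set_results = [is_osc_query(s) for s in sets]
```

One limitation of keying on `;?` alone: a SET payload that itself ends in `;?` (for example a window title ending in those two characters) would also match this sketch.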

Adds .pr/test_real_world.py that runs an agent with the gh command
to verify terminal query sequences are properly filtered.

Usage:
  LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
    uv run python .pr/test_real_world.py

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

📁 PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.

@jpshackelford
Contributor Author

Manual Testing Instructions

A real-world test script is available in .pr/test_real_world.py to verify the fix works correctly.

How to Run

# Clone and checkout the branch
git fetch origin fix/terminal-escape-filter-minimal
git checkout fix/terminal-escape-filter-minimal

# Run the test (uses All-Hands LLM proxy)
LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
    uv run python .pr/test_real_world.py

What the Test Does

  1. Creates an agent with the terminal tool
  2. Asks it to run gh pr list --repo OpenHands/openhands --limit 3
  3. The gh command sends terminal query sequences (DSR, OSC) as part of its spinner UI
  4. With the fix, these queries are filtered from the captured output

Success Criteria

Pass if:

  • NO visible escape codes like ^[[38;1R or rgb:30fb/3708/41af in the output
  • NO garbage appears on your shell prompt after the script exits
  • Colors from gh output are still visible (formatting preserved)

Fail if:

  • You see raw escape sequences in the terminal output
  • Garbage characters appear after the script completes

Without the Fix (for comparison)

To see the problem on main:

git checkout main
LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
    uv run python .pr/test_real_world.py

You should see escape code garbage like ^[[6n or ]11;? in the output.

@jpshackelford
Contributor Author

jpshackelford commented Mar 6, 2026

I have confirmed that this solution works. Note that there is a change in what the agent displays. Since we filter out OSC queries, the CLI does not render the gh spinner. I think this is an acceptable limitation. (Note that we aren't preventing the spinner from displaying, but when gh doesn't get back terminal query results, it elects not to display the spinner.)

Why the gh Spinner Doesn't Render Properly

The spinner animation in gh (and similar CLI tools) relies on terminal query sequences to function correctly. Here's why:

How Spinners Work

  1. Query cursor position: The spinner sends \x1b[6n (DSR) to ask "where is my cursor?"
  2. Receive response: The terminal responds with \x1b[row;colR
  3. Overwrite in place: Using the cursor position, the spinner moves back and overwrites itself with the next frame (⣾ → ⣽ → ⣻ → etc.)

What Our Filter Does

We filter out DSR queries (\x1b[6n) from the captured output because when they're displayed to the user's terminal, that terminal responds - and the response becomes visible garbage.

The Consequence

Without the cursor position query reaching the terminal:

  • The spinner never learns where it is
  • It can't move back to overwrite itself
  • Each spinner frame may appear on a new line, or the spinner may not animate at all

Why This Is Acceptable

  1. The command still works - gh pr list executes correctly and returns results
  2. Actual output is preserved - The PR list, colors, and formatting are intact
  3. Agent context - Spinners are for human feedback during waits; the agent doesn't need visual progress indicators
  4. The alternative is worse - Without filtering, you get ^[[6n^[[38;1R garbage polluting the output

Why Filter the Query, Not the Response?

The Response Problem

When a terminal query like \x1b[6n is displayed, the user's terminal:

  1. Processes the query
  2. Writes its response to stdin (e.g., \x1b[38;1R)
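
For reference, the response in step 2 has the fixed shape `ESC [ row ; col R`. A minimal sketch of recognising it follows; the function name `parse_cursor_report` is hypothetical and is not something the SDK provides.

```python
import re

# Cursor position report: ESC [ <row> ; <col> R
_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(data: bytes):
    """Return (row, col) if data is exactly one cursor position report, else None."""
    m = _CPR.fullmatch(data)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


position = parse_cursor_report(b"\x1b[38;1R")  # the response quoted above
arrow = parse_cursor_report(b"\x1b[A")         # up-arrow keypress, not a report
```

Recognising the shape is the easy part. The sketch only inspects bytes it is handed; it says nothing about when they arrive on stdin or whether the terminal has already echoed them.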

Filtering the response would require:

  1. Monitoring stdin continuously - Responses arrive asynchronously, potentially long after we've returned output to the agent. We'd need to constantly drain stdin throughout the entire session.
  2. Distinguishing responses from user input - If a user types while the agent is running, their keystrokes arrive on stdin too. How do we know \x1b[A is a terminal response vs. the user pressing the up arrow? We risk eating legitimate input.
  3. Racing against echo - By the time the response arrives on stdin, the terminal may have already echoed it to the display. The visible garbage (^[[38;1R) appears because the terminal echoes the response before we can intercept it. Filtering stdin doesn't prevent the visual pollution - the damage is already done.
  4. Complex terminal mode manipulation - Reliably reading stdin without blocking, while preserving terminal state, while not corrupting user input, across different platforms... this is the path the original PR #2245 went down with flush_stdin(): 700+ lines of complexity, and it did not work reliably without the OSC filtering.

Why Filtering Queries Is Better

  1. Single point of control - Filter in _get_command_output() before output is returned
  2. No response is ever generated - If the query never reaches the display terminal, there's nothing to clean up
  3. No stdin complexity - No terminal modes, no race conditions, no risk of eating user input
  4. Deterministic and testable - Simple regex on captured output

@jpshackelford
Contributor Author

I think this is ready, except that we should probably test the CLI built against this version of the SDK to ensure that our approach here doesn't interfere with the TUI.

Perhaps the best course is to open a PR that will build the CLI against this branch of the SDK and recruit some users to use it for a day or two.

jpshackelford added a commit to OpenHands/OpenHands-CLI that referenced this pull request Mar 6, 2026
This CLI build uses the software-agent-sdk branch from PR #2334 which
includes the terminal escape filter fix for tools like gh, npm that use
spinner/progress UI.

SDK PR: OpenHands/software-agent-sdk#2334

Co-authored-by: openhands <openhands@all-hands.dev>
@jpshackelford
Contributor Author

It looks like testing this in the CLI is blocked until the breaking change in #2133 is dealt with in the CLI, unless we rebase this fix branch on v1.11.5.

@jpshackelford
Contributor Author

OpenHands/OpenHands-CLI#587 was merged and should address the blocker on testing with OpenHands-CLI.

@enyst enyst requested a review from xingyaoww March 9, 2026 18:03
@enyst
Collaborator

enyst commented Mar 9, 2026

@OpenHands Do a /codereview-roasted on this PR. Publish your review feedback as a review on the PR, using the appropriate event in gh api. (Not a comment, a review; you are allowed to review.)

@openhands-ai

openhands-ai bot commented Mar 9, 2026

I'm on it! enyst can track my progress at all-hands.dev

Collaborator

@enyst enyst left a comment


🔴 Needs improvement

This is aimed at a real bug and the implementation stays pleasantly small, but there are two correctness holes here: the fix only applies to output coming back through TerminalSession, and the filter is stateless so split escape sequences can still leak through incremental updates. Since this also touches terminal/stdout handling, I’d want lightweight eval coverage after those are addressed.

Verdict: not ready as the claimed fix for #2244 yet.

Key insight: sanitizing after the stream has already been split into per-observation chunks is too late; either the sanitization needs carry-over state, or part of the fix has to live at the actual SDK terminal boundary rather than only in the terminal tool output path.

@openhands-ai

openhands-ai bot commented Mar 9, 2026

Final summary:

  • Reviewed PR #2334 against main, the PR description, and linked issue #2244.
  • Submitted a GitHub PR review using the proper COMMENT review event via gh api (not a regular comment).
  • Left two substantive inline concerns:
    1. the fix only sanitizes output in TerminalSession, so it does not fully cover the SDK-side leakage described in #2244
    2. the filter is stateless, so escape sequences split across incremental output chunks can still leak through

Checklist:

  • Performed a roasted code review of the PR
  • Published feedback as a review, not a comment
  • Focused on important correctness issues only
  • Made no code changes
  • No extraneous local modifications to push

Because I did not change repository files, there was nothing to push to the remote branch.

@all-hands-bot
Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @jpshackelford, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

jpshackelford and others added 3 commits March 19, 2026 20:21
Addresses review feedback:

1. Narrowed scope: Added documentation clarifying this fix only handles
   PTY-emitted queries (from commands run via terminal tool), not SDK-side
   queries (e.g., Rich library capability detection). SDK-side queries
   would need filtering at the conversation/visualizer boundary.

2. Made filter stateful: Created TerminalQueryFilter class that tracks
   incomplete escape sequences across chunks. This handles the case where
   long-running commands surface output incrementally and a query sequence
   may be split across multiple observations.

Changes:
- escape_filter.py: Added TerminalQueryFilter class with reset(), filter(),
  and flush() methods. filter() holds incomplete sequences at chunk end,
  prepends them to next chunk, ensuring split queries are detected and
  removed. Kept stateless filter_terminal_queries() for backward compat.
- terminal_session.py: Uses TerminalQueryFilter instance, calls flush()
  when command completes, and reset() before next command.
- Added 11 new tests for stateful filtering scenarios.

Co-authored-by: openhands <openhands@all-hands.dev>
@jpshackelford jpshackelford marked this pull request as draft March 20, 2026 21:22
@jpshackelford jpshackelford marked this pull request as ready for review March 20, 2026 21:22
@jpshackelford jpshackelford requested a review from enyst March 20, 2026 21:22
Collaborator

@all-hands-bot all-hands-bot left a comment


🟡 Acceptable - Requires Eval Verification

Code quality is solid: real bug fix with clean stateful filtering, well-tested (31 tests), all previous concerns addressed.

Eval gate: Touches terminal output handling → requires lightweight eval verification per repo policy before approval. Code is merge-ready once evals confirm no benchmark impact.

Verdict: ✅ Flagging for maintainer to run evals

Collaborator

@enyst enyst left a comment


HUMAN: WDYT?


(Openhands-GPT-5.4)

The scope/rationale looks good to me — the narrower “filter at the PTY source” approach is a lot saner than the broader earlier attempt — but I wouldn’t merge it as-is.

Overall assessment: good direction, but there’s one real correctness hole in the stateful filter. Since this changes terminal-output behavior, I’d also treat it as eval-risk after fixing.

Security / correctness

  • [openhands-tools/openhands/tools/terminal/utils/escape_filter.py, Lines 76–81, 145–153] 🐛 DECRQSS queries still leak if the chunk split happens between the ESC and \\ of the ST terminator.
    _INCOMPLETE_ESC_PATTERN only buffers \x1bP[^\x1b]*$, so for a split like:

    • chunk 1: "text\x1bP$qsetting\x1b"
    • chunk 2: "\\more"

    the first call emits "text\x1bP$qsetting" instead of keeping the whole DCS pending, and the second call outputs the rest. I reproduced this locally; the final result was:

    'text\x1bP$qsetting\x1b\\more'

    instead of "textmore".

    Suggested fix: make the incomplete-DCS handling keep the entire pending \x1bP... sequence until a full \x1b\\ arrives, including the case where the chunk ends on the ESC that starts the ST terminator.

Testing

  • [tests/tools/terminal/test_escape_filter.py, Lines 246–253] 🧪 The DECRQSS coverage misses the failing split boundary.
    The current test only splits before the ST terminator. Add a regression test that splits exactly at:

    • "text\x1bP$qsetting\x1b"
    • "\\more"

    That would catch the bug above and matches the “arbitrary chunk boundary” claim in the PR description.
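
The split boundary described in this review can be reproduced in isolation with a small sketch. Both patterns are assumptions: `NARROW` is reconstructed from the description of `_INCOMPLETE_ESC_PATTERN` above (plus a lone trailing ESC) and is not copied from the PR, and `WIDER` is one possible shape for the suggested fix. The helper `split_pending` is hypothetical.

```python
import re

# Reconstructed from the review's description; assumed, not copied from the PR.
# Holds an unterminated DCS, or a lone trailing ESC.
NARROW = re.compile(rb"\x1b(?:P[^\x1b]*)?\Z")
# Also holds a DCS whose chunk ends on the ESC that starts the ST terminator.
WIDER = re.compile(rb"\x1b(?:P[^\x1b]*\x1b?)?\Z")


def split_pending(data: bytes, pattern: re.Pattern):
    """Return (bytes safe to emit now, bytes to hold for the next chunk)."""
    m = pattern.search(data)
    if m is None:
        return data, b""
    return data[: m.start()], data[m.start():]


chunk = b"text\x1bP$qsetting\x1b"
narrow_emit, narrow_held = split_pending(chunk, NARROW)
wider_emit, wider_held = split_pending(chunk, WIDER)
```

With `NARROW`, only the final ESC is held and the start of the DCS is emitted immediately, which matches the leak reported above. With `WIDER`, the whole pending DCS is held, so a later chunk beginning with `\\` can complete the sequence and let it be removed.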

So: good idea, not quite correct yet. After that’s fixed, I think the scoped approach in the description is reasonable.

@all-hands-bot
Collaborator

[Automatic Post]: This PR seems to be currently waiting for review. @xingyaoww, could you please take a look when you have a chance?

4 participants